Cooperative Kernels: GPU Multitasking for Blocking Algorithms (Extended Version)
Authors
Abstract
There is growing interest in accelerating irregular data-parallel algorithms on GPUs. These algorithms are typically blocking, so they require fair scheduling. But GPU programming models (e.g., OpenCL) do not mandate fair scheduling, and GPU schedulers are unfair in practice. Current approaches avoid this issue by exploiting scheduling quirks of today's GPUs in a manner that does not allow the GPU to be shared with other workloads (such as graphics rendering tasks). We propose cooperative kernels, an extension to the traditional GPU programming model geared towards writing blocking algorithms. Workgroups of a cooperative kernel are fairly scheduled, and multitasking is supported via a small set of language extensions through which the kernel and scheduler cooperate. We describe a prototype cooperative kernel framework implemented in OpenCL 2.0 and evaluate our approach by porting a set of blocking GPU applications to cooperative kernels and examining their performance under multitasking. Our prototype exploits no vendor-specific hardware, driver or compiler support; our results therefore provide a lower bound on the efficiency with which cooperative kernels can be implemented in practice.
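To make the programming model concrete, the sketch below shows how a work-list style blocking computation might be expressed as a cooperative kernel. The cooperation primitives (`offer_kill`, `request_fork`), the task layout, and the helper stubs are illustrative assumptions based on the abstract's description, not the framework's actual API.

```c
/* Illustrative sketch only (OpenCL 2.0 C). offer_kill() and request_fork()
 * stand in for the cooperative-kernel language extensions described in the
 * abstract; they are stubbed out here so the sketch is self-contained. In a
 * real cooperative kernel they would be provided by the framework and let the
 * scheduler kill or fork workgroups at these points. */
void offer_kill(void)   { /* workgroup offers to leave if resources are needed */ }
void request_fork(void) { /* workgroup asks the scheduler for extra workgroups */ }

__kernel void cooperative_worklist(__global atomic_int *head,
                                   const int num_tasks,
                                   __global float *data)
{
    for (;;) {
        /* Cooperation points sit between tasks, where no locks are held, so the
         * scheduler can safely shrink or grow the kernel to multitask with
         * other workloads (e.g. graphics rendering). */
        offer_kill();
        request_fork();

        /* One representative of the workgroup claims the next task. */
        int t = -1;
        if (get_local_id(0) == 0)
            t = atomic_fetch_add(head, 1);
        t = work_group_broadcast(t, 0);
        if (t >= num_tasks)
            break;                          /* work list drained: terminate */

        /* Process task t cooperatively within the workgroup. */
        for (int i = get_local_id(0); i < 256; i += get_local_size(0))
            data[t * 256 + i] += 1.0f;

        barrier(CLK_GLOBAL_MEM_FENCE);      /* finish the task before the next cooperation point */
    }
}
```

Placing the cooperation points between tasks, where a workgroup holds no resources, is what lets the scheduler resize the kernel without violating the blocking algorithm's fairness assumptions.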
Similar resources
GPU-UniCache: Automatic Code Generation of Spatial Blocking for Stencils on GPUs
Spatial blocking is a critical memory-access optimization to efficiently exploit the computing resources of parallel processors, such as many-core GPUs. By reusing cache-loaded data over multiple spatial iterations, spatial blocking can significantly lessen the pressure of accessing slow global memory. Stencil computations, for example, can exploit such data reuse via spatial blocking through t...
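For context, the following minimal sketch shows the kind of spatial blocking the excerpt refers to, applied to a 1D 3-point stencil in OpenCL C: each workgroup stages a tile of the input (plus a one-element halo on each side) in local memory so that neighbouring work-items reuse cached data instead of re-reading global memory. The kernel name, tile layout, coefficients, and zero boundary handling are illustrative assumptions, not the paper's generated code.

```c
/* Minimal spatial-blocking sketch for a 1D 3-point stencil (assumed example).
 * tile must be allocated with get_local_size(0) + 2 floats. */
__kernel void stencil_blocked(__global const float *in,
                              __global float *out,
                              const int n,
                              __local float *tile)
{
    const int gid = get_global_id(0);
    const int lid = get_local_id(0);
    const int lsz = get_local_size(0);

    /* Stage the tile: each work-item loads its element, edge work-items also
     * load the halo cells; out-of-range reads fall back to zero. */
    tile[lid + 1] = (gid < n) ? in[gid] : 0.0f;
    if (lid == 0)
        tile[0] = (gid > 0) ? in[gid - 1] : 0.0f;
    if (lid == lsz - 1)
        tile[lsz + 1] = (gid + 1 < n) ? in[gid + 1] : 0.0f;
    barrier(CLK_LOCAL_MEM_FENCE);

    /* 3-point stencil computed entirely from the locally cached tile. */
    if (gid < n)
        out[gid] = 0.25f * tile[lid] + 0.5f * tile[lid + 1] + 0.25f * tile[lid + 2];
}
```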
EffiSha: A Software Framework for Enabling Efficient Preemptive Scheduling of GPU
Modern GPUs are broadly adopted in many multitasking environments, including data centers and smartphones. However, the current support for the scheduling of multiple GPU kernels (from different applications) is limited, forming a major barrier for GPU to meet many practical needs. This work for the first time demonstrates that on existing GPUs, efficient preemptive scheduling of GPU kernels is...
A Fine Grained Cycle Sharing System with Cooperative Multitasking on GPUs
The emergence of compute unified device architecture (CUDA), which has relieved application developers from having to understand complex graphics pipelines, has made the graphics processing unit (GPU) useful not only for graphics applications but also for general applications. In this paper, we present a cycle sharing system named GPU grid, which exploits idle GPU cycles to accelerate scientifi...
Accelerating GPU Kernels for Dense Linear Algebra
Implementations of the Basic Linear Algebra Subprograms (BLAS) interface are major building block of dense linear algebra (DLA) libraries, and therefore have to be highly optimized. We present some techniques and implementations that significantly accelerate the corresponding routines from currently available libraries for GPUs. In particular, Pointer Redirecting – a set of GPU specific optimiz...
Factorization and Inversion of a Million Matrices using GPUs: Challenges and Countermeasures
This paper presents new algorithmic approaches and optimization techniques for LU factorization and matrix inversion of millions of very small matrices using GPUs. These problems appear in many scientific applications including astrophysics and generation of block-Jacobi preconditioners. We show that, for very small problem sizes, design and optimization of GPU kernels require a mindset differe...
Journal: CoRR
Volume: abs/1707.01989
Issue: -
Pages: -
Year of publication: 2017